Education Apps: Market Trends, Monetization, and Growth Opportunities¶

Introduction¶

The education app market has experienced rapid growth in recent years, driven by increased mobile device adoption, digital learning trends, and demand for accessible educational content. This analysis leverages a dataset of over 2 million apps to uncover key trends, revenue strategies, and growth opportunities within the education category.

Objectives of this analysis:

  1. Identify the distribution of education apps by type (free vs paid) and monetization strategy (ads, in-app purchases, freemium models).
  2. Explore trends in app downloads and user engagement to determine which strategies correlate with higher reach.
  3. Provide actionable insights for companies and app developers to optimize revenue, improve user acquisition, and prioritize app development focus areas.

Dataset Overview:

  • Size: 2,000,000+ apps
  • Features: app category, pricing model, average installs, ratings, revenue indicators, and more
  • Scope: Analysis focuses specifically on apps within the Education category

Key Value for Stakeholders:
By analyzing market patterns and monetization strategies, companies can make informed decisions about app development, marketing, and pricing, targeting segments with the highest growth and revenue potential.


Packages & Setup¶

We’ll use these packages for data cleaning, analysis, and visualization.

In [2]:
# Install these packages before running the notebook
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px

Data Import¶

In [3]:
import os

def load_dataset(file_path):
    """Load CSV dataset with error handling."""
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"The file {file_path} was not found. Please check the path.")
    try:
        df = pd.read_csv(file_path)
        print(f"Dataset loaded successfully: {df.shape[0]} rows, {df.shape[1]} columns")
        return df
    except Exception as e:
        # Chain the original exception so the full traceback is preserved
        raise Exception(f"Error while loading dataset: {e}") from e
In [4]:
file_path = r"C:\Users\A\Desktop\playstore_app_market_insights\dataset\Google-Playstore.csv"
df = load_dataset(file_path)
Dataset loaded successfully: 2312944 rows, 24 columns

Dataset contains a comprehensive set of app features useful for revenue and user behavior analysis.


Data Cleaning & Transformation¶

In [5]:
def clean_dataset(df):
    
    # 1. Handle Missing Values
    # .copy() avoids SettingWithCopyWarning on the column assignments below
    df = df.dropna(subset=['App Name']).copy()
    df['Rating'] = df.groupby('Category')['Rating'].transform(lambda x: x.fillna(x.median()))
    df['Released_missing'] = df['Released'].isna().astype(int)
    df['Released'] = df['Released'].fillna(df['Last Updated'])
    df['Developer Id'] = df['Developer Id'].fillna("N/A")
    df['max_inst_miss'] = df['Minimum Installs'].isna().astype(int)
    df['Minimum Installs'] = df['Minimum Installs'].fillna(df['Maximum Installs'])
    df['Currency'] = df['Currency'].fillna("N/A")

    # 2. Drop Useless Columns
    df = df.drop([
        'Developer Website', 'Developer Email', 'Privacy Policy', 'Scraped Time', 
        'App Id', 'Installs', 'Rating Count', 'Minimum Android'
    ], axis=1, errors='ignore')

    # 3. Normalize Size
    df["Size"] = df["Size"].astype(str).str.replace(",", "").str.replace(" ", "")
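    def convert_size(value):
        """Convert a size string to kilobytes ('M' = megabytes, 'k' = kilobytes)."""
        try:
            val = str(value).strip()
            if val.lower() in {"varieswithdevice", "na", "n/a", ""}:
                return np.nan
            if val[-1].lower() == "m":
                return float(val[:-1]) * 1000  # MB -> KB
            elif val[-1].lower() == "k":
                return float(val[:-1])  # already in KB
            else:
                return float(val)
        except (ValueError, IndexError):
            # Unparseable values (e.g. unexpected suffixes) become missing
            return np.nan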
    df["size"] = df["Size"].apply(convert_size)
    df = df.drop(['Size'], axis=1, errors='ignore')

    # 4. Convert Boolean to Int
    df['Free'] = df['Free'].astype(int)
    df['Ad Supported'] = df['Ad Supported'].astype(int)
    df['In App Purchases'] = df['In App Purchases'].astype(int)
    df['Editors Choice'] = df['Editors Choice'].astype(int)

    # 5. Derived Columns
    df['avg_installs'] = ((df['Minimum Installs'] + df['Maximum Installs']) / 2).round(0)
    df['Released'] = pd.to_datetime(df['Released'], errors='coerce')
    df['released_year'] = df['Released'].dt.year

    # 6. Rename Columns (snake_case)
    df = df.rename(columns={
        "App Name": "app_name",
        "Category": "category",
        "Rating": "rating",
        "Free": "app_status",
        "Currency": "currency",
        "Developer Id": "developer_name",
        "Released": "released_date",
        "Last Updated": "last_update",
        "Content Rating": "content_target",
        "Ad Supported": "ads_flag",
        "In App Purchases": "in_app_purchases_flag",
        "Editors Choice": "play_store_recommend"
    })

    # Ensure consistency between Price and app_status
    df.loc[df['Price'] > 0, 'app_status'] = 0  # Paid
    df.loc[df['Price'] == 0, 'app_status'] = 1  # Free

    # 7. Remove Duplicates
    df = df.drop_duplicates(['app_name'], keep='first')

    print(f"Cleaning complete: {df.shape[0]} rows, {df.shape[1]} columns remain.")
    return df
In [6]:
df = clean_dataset(df)
Cleaning complete: 2177943 rows, 20 columns remain.
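
To make the size normalization in step 3 concrete, here is a minimal standalone sketch. The helper name `size_to_kb` and the sample values are illustrative assumptions, not part of the notebook; the conversion rule mirrors the one used above (values ending in "M" are multiplied by 1000, values ending in "k" are kept as they are, and anything unparseable becomes missing).

```python
import numpy as np
import pandas as pd

def size_to_kb(value):
    """Convert a size string such as '10M' or '512k' to kilobytes (hypothetical helper)."""
    val = str(value).replace(",", "").replace(" ", "").lower()
    if val in {"varieswithdevice", "na", "n/a", ""}:
        return np.nan
    try:
        if val.endswith("m"):
            return float(val[:-1]) * 1000  # megabytes -> kilobytes
        if val.endswith("k"):
            return float(val[:-1])         # already kilobytes
        return float(val)
    except ValueError:
        return np.nan                      # unexpected format

# Made-up sample values covering each branch
sample = pd.Series(["10M", "512k", "1,024k", "Varies with device", "3.5M"])
sizes_kb = sample.apply(size_to_kb)
print(sizes_kb.tolist())
```

Keeping every size in a single unit makes the resulting `size` column directly comparable across apps.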

Data Validation Checks¶

Before running the analysis, we validate critical columns to ensure data integrity:

  • Ratings must be between 0 and 5.
  • Year formats must be valid.
  • All required columns must exist in the dataset.
In [8]:
# List of required columns
required_cols = [
    "app_name", "category", "rating", "released_date", 
    "last_update", "app_status", "ads_flag", 
    "in_app_purchases_flag", "play_store_recommend", "avg_installs"
]

# 1. Check for missing required columns
missing_cols = [col for col in required_cols if col not in df.columns]
if missing_cols:
    print(" Missing columns:", missing_cols)
else:
    print(" All required columns are present.")

# 2. Validate rating values (should be between 0 and 5)
invalid_ratings = df[~df['rating'].between(0, 5, inclusive="both")]
print(f" Ratings valid: {len(invalid_ratings) == 0}")
if len(invalid_ratings) > 0:
    print(" Invalid ratings found:", invalid_ratings['rating'].unique())

# 3. Validate year formats for released_date
df['released_year'] = df['released_date'].dt.year
invalid_years = df[df['released_year'].isna()]
print(f" Year formats valid: {len(invalid_years) == 0}")
if len(invalid_years) > 0:
    print(" Invalid years detected in released_date.")
 All required columns are present.
 Ratings valid: True
 Year formats valid: True
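
The same three checks can also be wrapped in a reusable function that returns a report instead of printing. The sketch below is one possible shape, not the notebook's own code: `validate_frame` and the toy data are assumptions introduced for illustration.

```python
import pandas as pd

def validate_frame(frame, required_cols):
    """Return a dict of validation problems (hypothetical helper)."""
    # 1. Required columns that are absent
    missing = [c for c in required_cols if c not in frame.columns]

    # 2. Ratings outside the 0-5 range
    bad_ratings = []
    if "rating" in frame.columns:
        mask = ~frame["rating"].between(0, 5)
        bad_ratings = frame.loc[mask, "rating"].tolist()

    # 3. Release dates that cannot be parsed
    bad_dates = 0
    if "released_date" in frame.columns:
        parsed = pd.to_datetime(frame["released_date"], errors="coerce")
        bad_dates = int(parsed.isna().sum())

    return {"missing_cols": missing, "bad_ratings": bad_ratings, "bad_dates": bad_dates}

# Toy data with one out-of-range rating and one unparseable date
toy = pd.DataFrame({
    "app_name": ["A", "B", "C"],
    "rating": [4.5, 5.7, 0.0],
    "released_date": ["2020-01-15", "not a date", "2019-06-01"],
})
report = validate_frame(toy, ["app_name", "rating", "released_date", "category"])
print(report)
```

Returning a dictionary makes it easy to assert on the result in automated checks rather than reading printed output.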

Exploratory Data Analysis (EDA)¶

Google Play Apps Overview¶

In this section, we perform an initial exploration of the Google Play dataset to understand the overall app market, including:

  • Total number of apps and category distribution.
  • Pricing and monetization strategies (Free vs Paid, Ads/IAP)
  • Ratings and user engagement.
  • Installs and popularity metrics.

These insights will help identify market trends and guide strategic recommendations before focusing on a deep dive into Education apps.

General summary and counts

In [67]:
# 1. General summary
def analyze_total_apps(df):
    total_apps = len(df)
    print(f"The Total number of apps: {total_apps}")
    return total_apps


# 2. Category analysis
def analyze_categories(df, column = 'category'):
    counts = df[column].value_counts().head(10).reset_index()
    counts.columns = [column, 'Count']
    
    fig = px.bar(
        counts, 
        x= column, 
        y='Count',
        title="Top 10 Categories by Number of Apps",
        text='Count'
    )
    fig.update_traces(textposition='outside')
    fig.show()
In [68]:
analyze_total_apps(df)
The Total number of apps: 2177943
Out[68]:
2177943
In [69]:
# Category distribution
analyze_categories(df, column = 'category')

The dataset contains 2,177,943 apps, showing a very large and diverse market on Google Play.

The most populated categories are Education, Music & Audio, Business, Tools, Entertainment, and Lifestyle, indicating that these segments dominate the app ecosystem.

Education being the largest category points to heavy developer investment in learning and skill-building apps, which likely reflects strong user demand. The size of Music & Audio and Business shows that entertainment and productivity also remain major focuses.

Pricing and Monetization

In [70]:
# 3. Free vs Paid distribution

def plot_free_apps_financing(df):
    
    # Classify free apps
    free_apps = df[df['Price'] == 0].copy()
    free_apps['financing_type'] = 'Nothing'
    
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    free_apps.loc[(free_apps['ads_flag'] == 0) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    
    # Count and percentage
    counts = free_apps['financing_type'].value_counts()
    percent = (counts / counts.sum() * 100).round(2)
    
    # Plot pie chart
    fig = px.pie(
        names=counts.index,
        values=counts.values,
        title='Distribution of Free Apps by Financing Type',
        hole=0.3  # donut chart style
    )
    fig.show()


# Financing Trends
def plot_all_apps_trend(df):
    """
    Plots the trend of financing methods among all apps (Free and Paid) over the years.
    For paid apps, the financing type is considered 'Paid' since they are directly monetized.
    """
    
    apps = df.copy()

    # Ensure 'released_year' is numeric
    apps['released_year'] = pd.to_numeric(apps['released_year'], errors='coerce')

    # Classify financing type
    apps['financing_type'] = 'Nothing'
    
    # For Free apps
    free_mask = apps['Price'] == 0
    apps.loc[free_mask & (apps['ads_flag'] == 1) & (apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    apps.loc[free_mask & (apps['ads_flag'] == 1) & (apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    apps.loc[free_mask & (apps['ads_flag'] == 0) & (apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    
    # For Paid apps
    apps.loc[apps['Price'] > 0, 'financing_type'] = 'Paid'

    # Group by year and financing type
    yearly_counts = apps.groupby(['released_year', 'financing_type']).size().reset_index(name='count')

    # Calculate percentage per year
    yearly_counts['percentage'] = yearly_counts.groupby('released_year')['count'].transform(lambda x: x / x.sum() * 100)

    # Plot trend
    fig = px.line(yearly_counts, x='released_year', y='percentage', color='financing_type',
                  markers=True,
                  title='Trend of Financing Methods Among All Apps Over Years',
                  labels={'percentage': 'Percentage of Apps', 'released_year': 'Year'})
    fig.show()
In [71]:
# Free vs Paid
plot_free_apps_financing(df)

Almost half of free apps (48.4%) fall under "Nothing", meaning they use neither ads nor in-app purchases (IAP). These apps may be entirely free with no direct revenue model, possibly relying on external funding or serving promotional purposes.

42.97% of free apps monetize using ads only, which is the most common revenue strategy among monetized free apps.

Only 6.27% of free apps use both ads and IAP. Dual monetization is relatively rare, although it gives an app two revenue streams.

2.33% of free apps rely solely on IAP, making it the least common approach.
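
The four financing labels can also be assigned in a single vectorized step. The sketch below uses `numpy.select` on a small made-up frame; the toy values are assumptions, while the column names and labels follow the function above.

```python
import numpy as np
import pandas as pd

# Toy data: four free apps (one per financing type) and one paid app
toy = pd.DataFrame({
    "Price": [0, 0, 0, 0, 2.99],
    "ads_flag": [1, 1, 0, 0, 0],
    "in_app_purchases_flag": [1, 0, 1, 0, 0],
})

free = toy[toy["Price"] == 0].copy()
has_ads = free["ads_flag"] == 1
has_iap = free["in_app_purchases_flag"] == 1

# Conditions are evaluated in order; rows matching none get the default label
free["financing_type"] = np.select(
    [has_ads & has_iap, has_ads & ~has_iap, ~has_ads & has_iap],
    ["Ads + IAP", "Ads only", "IAP only"],
    default="Nothing",
)

share = (free["financing_type"].value_counts(normalize=True) * 100).round(2)
print(share)
```

This produces the same labels as the three `.loc` assignments, with the conditions and their labels listed side by side.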

In [72]:
plot_all_apps_trend(df)

Ratings & user engagement

In [73]:
# 4. Rating distribution
def analyze_rating_distribution(df):
    df_filtered = df[df['rating'] > 0 ]
    fig = px.histogram(df_filtered, x="rating", nbins=20, title="Ratings Distribution")
    fig.show()
In [74]:
analyze_rating_distribution(df)

Installs & popularity metrics

In [75]:
def get_category_installs(df):
    category_installs = df.groupby('category')['avg_installs'].mean().sort_values(ascending=False).head()
    category_installs = category_installs.round(0).reset_index()
    fig1 = px.bar(category_installs, x='category', y='avg_installs',
            title='Top Categories by Average Installs',
            labels={'avg_installs': 'Average Installs', 'category': 'Category'},
            text='avg_installs')
    fig1.show()
    # Top 3 apps
    top3 = df.sort_values(by='avg_installs', ascending=False).head(3)
    fig2 = px.bar(top3, x='app_name', y='avg_installs', color='category',
                  title='Top 3 Apps by Installs',
                  text='avg_installs')
    fig2.show()
In [76]:
get_category_installs(df)

Deep Dive: Education Apps¶

This section focuses on Education apps, with the goal of providing insights and recommendations directly relevant for XpertBot's Education app strategy.

Objectives:

  • Understand the Education app market, user engagement, and monetization trends.
  • Identify high-performing apps and successful strategies.
  • Provide actionable recommendations to improve XpertBot's app downloads, ratings, and revenue.

General summary & key metrics

In [77]:
def education_count(df):
    edu_app =  df[df['category'].isin(['Educational','Education']) ]
    edu_count = edu_app.shape[0]
    avg_installs = edu_app['avg_installs'].mean().round(0)
    subset_nonzero = edu_app[edu_app['rating'] != 0]
    avg_rating_edu = subset_nonzero['rating'].mean().round(2)  # no need to groupby, only Education category
    
    print("The number of education apps is:", edu_count)
    print("\nThe average installs of education apps is:", avg_installs)
    print("\nThe average rating of the education apps is:", avg_rating_edu)
In [78]:
education_count(df)
The number of education apps is: 248212

The average installs of education apps is: 56330.0

The average rating of the education apps is: 4.18

Education apps are numerous (~248k), reach a sizeable audience (about 56.3k installs per app on average), and are rated well (average rating 4.18 among rated apps), indicating both high demand and a generally positive user experience.
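
Install counts in this category are heavily skewed: the top Education apps shown later exceed 100 million installs, while the category mean is about 56 thousand. In such cases the median is a useful companion to the mean. The sketch below uses made-up install counts, purely to illustrate how a single very large app pulls the mean upward.

```python
import pandas as pd

# Hypothetical install counts: five small apps and one very large one
installs = pd.Series([50, 500, 1_000, 5_000, 10_000, 100_000_000])

mean_installs = installs.mean()
median_installs = installs.median()

print(f"Mean installs:   {mean_installs:,.0f}")
print(f"Median installs: {median_installs:,.0f}")
```

Reporting both figures side by side gives a more balanced picture of what a typical app achieves.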

Pricing & Monetization

In [79]:
def education_free_paid_stats(df):
    # Filter Education apps
    edu_app =  df[df['category'].isin(['Educational','Education']) ]
    
    # Count of Free and Paid apps
    free_paid_count = edu_app['app_status'].map({1: 'Free', 0: 'Paid'}).value_counts()
    
    # Percentage of Free and Paid apps
    free_paid_percentage = edu_app['app_status'].map({1: 'Free', 0: 'Paid'}).value_counts(normalize=True) * 100
    
    # Return values if you want to reuse them
    return free_paid_count, free_paid_percentage

def plot_education_free_paid(free_paid_count, free_paid_percentage):
    # Prepare data
    df_plot = free_paid_count.reset_index()
    df_plot.columns = ['Status', 'Count']
    df_plot['Percentage'] = free_paid_percentage.values.round(2)
    
    # Bar chart
    fig = px.bar(df_plot, x='Status', y='Count', text='Percentage',
                 title='Free vs Paid Education Apps',
                 labels={'Count':'Number of Apps', 'Status':'App Status'})
    
    fig.update_traces(texttemplate='%{text}%', textposition='outside')
    fig.show()

def avg_paid_education_price(df):
    # Filter Education apps
    education_app =  df[df['category'].isin(['Educational','Education']) ]
    
    # Filter only paid apps
    paid_education_apps = education_app[education_app['Price'] > 0]
    
    # Compute average price
    avg_price = paid_education_apps['Price'].mean().round(3)
    
    # Print result
    print(f'The average price of paid Education apps is ${avg_price}')
    
    # Return value for reuse
    return avg_price
In [80]:
# Compute stats first
free_paid_count, free_paid_percentage = education_free_paid_stats(df)
plot_education_free_paid(free_paid_count, free_paid_percentage)
In [81]:
avg_price = avg_paid_education_price(df)
The average price of paid Education apps is $5.721

Paid Education apps are moderately priced on average (~$5.72), suggesting a low-cost barrier that aligns with accessibility and mass adoption strategies.

Installs & Revenue Metrics

In [82]:
# Define function
def revenue_summary(df):
    """Calculate total revenue for education apps vs all apps."""
    # Paid apps only
    paid_apps = df[(df['Price'] > 0) & (df['avg_installs'] > 0)].copy()
    paid_apps['revenue'] = paid_apps['Price'] * paid_apps['avg_installs']
    
    # Education revenue
    edu_revenue = paid_apps[paid_apps['category'].isin(['Educational','Education'])]['revenue'].sum()
    total_revenue = paid_apps['revenue'].sum()
    edu_share = (edu_revenue / total_revenue) * 100
    
    print("Total Estimated Revenue Across All Paid Apps: ${:,.0f}".format(total_revenue))
    print("Education Apps Revenue: ${:,.0f}".format(edu_revenue))
    print("Education Share of Total Paid Revenue: {:.2f}%".format(edu_share))

    # Create dataframe of paid education apps
    # (filters on 'Education' only, whereas edu_revenue above also includes 'Educational')
    edu_paid = df[(df['category'] == 'Education') & (df['Price'] > 0) & (df['avg_installs'] > 0)].copy()
    edu_paid['revenue'] = edu_paid['Price'] * edu_paid['avg_installs']
    
    return edu_revenue, total_revenue, edu_share, edu_paid
In [83]:
# Run function first to get edu_paid
edu_revenue, total_revenue, edu_share, edu_paid = revenue_summary(df)

# Format revenue with commas and round
top5_revenue_apps = edu_paid.sort_values(by="revenue", ascending=False).head(5).copy()
top5_revenue_apps['revenue'] = top5_revenue_apps['revenue'].apply(lambda x: f"${x:,.0f}")
top5_revenue_apps['avg_installs'] = top5_revenue_apps['avg_installs'].apply(lambda x: f"{x:,.0f}")

# Show as table
from IPython.display import display
display(top5_revenue_apps[['app_name', 'Price', 'avg_installs', 'revenue']])
Total Estimated Revenue Across All Paid Apps: $2,071,291,277
Education Apps Revenue: $116,620,919
Education Share of Total Paid Revenue: 5.63%
         app_name                                           Price      avg_installs  revenue
1610632  Driving Theory Test 4 in 1 Kit + Hazard Percep...  5.490000   659,096       $3,618,437
903078   Toca Lab: Elements                                 3.990000   646,984       $2,581,466
260116   Toca Life: City                                    3.990000   533,906       $2,130,285
809596   Driving school theory - Fahrlehrer24               20.015573  104,906       $2,099,754
861489   Official DVSA Theory Test Kit                      5.490000   298,568       $1,639,138
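
The revenue figures in this table are estimates computed as price × average installs. The estimate does not account for refunds, price changes over time, or store fees. As a worked check, the sketch below recomputes the first three rows from the values shown in the table (app names are copied as displayed, including the truncation).

```python
import pandas as pd

# Values copied from the top-revenue table above
top_paid = pd.DataFrame({
    "app_name": [
        "Driving Theory Test 4 in 1 Kit + Hazard Percep...",
        "Toca Lab: Elements",
        "Toca Life: City",
    ],
    "Price": [5.49, 3.99, 3.99],
    "avg_installs": [659096, 646984, 533906],
})

# Estimated revenue = price x average installs
top_paid["revenue"] = top_paid["Price"] * top_paid["avg_installs"]
top_paid["revenue_fmt"] = top_paid["revenue"].map(lambda x: f"${x:,.0f}")
print(top_paid[["app_name", "revenue_fmt"]])
```

The formatted results match the `revenue` column displayed in the table.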

Developers Analysis

In [84]:
def top_education_developers(df, top_n=10):
    # Filter Education apps
    education_app = df[df['category'] == 'Education']
    
    # Count apps per developer
    dev_by_app = education_app['developer_name'].value_counts().head(top_n)

    df_plot = dev_by_app.reset_index()
    df_plot.columns = ['Developer', 'Number of Apps']
    
    fig = px.bar(df_plot, x='Number of Apps', y='Developer', orientation='h',
                 title='Top 10 Education App Developers',
                 text='Number of Apps')
    
    fig.update_layout(yaxis={'categoryorder':'total ascending'})  # largest on top
    fig.show()
    
In [85]:
top_education_developers(df)

Financing models analysis

In [86]:
def iap_stats(df):
    # Overall apps with/without IAP
    app_with_iap = df['in_app_purchases_flag'].value_counts()
    p_app_with_iap = df['in_app_purchases_flag'].value_counts(normalize=True) * 100
    
    # Education apps
    education_app =  df[df['category'].isin(['Educational','Education']) ]
    edu_with_iap = education_app['in_app_purchases_flag'].value_counts()
    edu_with_iap_percentage = education_app['in_app_purchases_flag'].value_counts(normalize=True) * 100
    
    df_plot1 = edu_with_iap.reset_index()
    df_plot1.columns = ['IAP', 'Count']
    df_plot1['IAP'] = df_plot1['IAP'].map({0:'No IAP', 1:'Has IAP'})
    
    fig1 = px.pie(df_plot1, names='IAP', values='Count',
                  title='Education Apps: With vs Without IAP')
    fig1.show()


def ads_distribution_education(df):
    
    edu_apps =  df[df['category'].isin(['Educational','Education']) ]
    
    # Count ads vs no ads
    ads_counts = edu_apps['ads_flag'].map({1: "With Ads", 0: "No Ads"}).value_counts().reset_index()
    ads_counts.columns = ["Ads Status", "Count"]
    
    # Percentage
    ads_counts["Percentage"] = (ads_counts["Count"] / ads_counts["Count"].sum()) * 100
    
    # Plot interactive pie chart
    fig = px.pie(
        ads_counts,
        names="Ads Status",
        values="Count",
        title="Ads Distribution in Education Apps",
        hole=0.3
    )
    fig.show()


def plot_free_apps_financing(df):
    #filter education apps
    education_app = df[df['category'].isin(['Education','Educational'])]
    
    # Classify free apps
    free_apps = education_app[education_app['Price'] == 0].copy()
    free_apps['financing_type'] = 'Nothing'
    
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    free_apps.loc[(free_apps['ads_flag'] == 1) & (free_apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    free_apps.loc[(free_apps['ads_flag'] == 0) & (free_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    
    # Count and percentage
    counts = free_apps['financing_type'].value_counts()
    percent = (counts / counts.sum() * 100).round(2)

    paid_apps = education_app[education_app['Price'] > 0].copy()  # paid = any price above zero
    paid_apps['financing_type'] = 'Paid'

    paid_apps.loc[(paid_apps['ads_flag'] == 1) & (paid_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'Ads + IAP'
    paid_apps.loc[(paid_apps['ads_flag'] == 1) & (paid_apps['in_app_purchases_flag'] == 0), 'financing_type'] = 'Ads only'
    paid_apps.loc[(paid_apps['ads_flag'] == 0) & (paid_apps['in_app_purchases_flag'] == 1), 'financing_type'] = 'IAP only'
    
    # Count and percentage
    paid_counts = paid_apps['financing_type'].value_counts()
    paid_percent = (paid_counts / paid_counts.sum() * 100).round(2)
    
    # Plot pie chart - free apps
    fig_1 = px.pie(
        names=counts.index,
        values=counts.values,
        title='Distribution of Free Education Apps by Financing Type',
        hole=0.3  # donut chart style
    )
    fig_1.show()

    # paid apps
    fig_2 = px.pie(
        names=paid_counts.index,
        values=paid_counts.values,
        title='Distribution of Paid Education Apps by Financing Type',
        hole=0.3  # donut chart style
    )
    fig_2.show()
In [87]:
iap_stats(df)

Only 7.6% of Education apps use in-app purchases (IAP). The vast majority (92.4%) rely on other revenue models or none at all.

In [88]:
ads_distribution_education(df)

Ads are a common but not dominant monetization strategy. Too many ads could hurt user experience.

In [66]:
plot_free_apps_financing(df)

The free segment is split: a majority simply offer free access (possibly funded externally), while others lean towards ads. Very few free apps use IAP, which suggests that monetizing educational content directly is less common.

Users who pay for Education apps expect an ad-free premium experience. Mixing ads into paid apps is rare and potentially risky.

Recommendations¶

The dominant and most accepted model in Education apps is Free (with optional ads or IAP).

Launching as a free app will align with user expectations and maximize reach.

Ads can be used moderately, but IAP (premium features, certificates, or advanced content) could be XpertBot’s main monetization path, since apps combining Free + Ads + IAP tend to capture both installs and revenue streams (as we’ll confirm in the final analysis).


Final Comparative Analysis¶

To better understand how Education apps position themselves in the wider Play Store ecosystem, we compare different financing strategies, adoption levels, and user ratings. This section evaluates the performance of free vs. paid models, the effectiveness of ads and in-app purchases, and the strategies used by top-performing apps. The goal is to highlight which approaches drive the highest installs and ratings, and to draw practical lessons for XpertBot’s own education app.

Education apps & financing strategies

In [89]:
def top10_educational_apps(df):
    """Show top 10 education apps by installs and their financing strategy"""
    
    # Filter education apps
    edu_apps = df[df['category'].isin(['Educational','Education']) ]
    
    # Sort by installs
    top10_edu = edu_apps.sort_values(by="avg_installs", ascending=False).head(10)
    
    # Define financing strategy
    def financing(row):
        if row['app_status'] == 0:  # Paid app
            return "Paid App"
        elif row['ads_flag'] == 1 and row['in_app_purchases_flag'] == 1:
            return "Ads + IAP"
        elif row['ads_flag'] == 1:
            return "Ads Only"
        elif row['in_app_purchases_flag'] == 1:
            return "IAP Only"
        else:
            return "Free (No Revenue)"
    
    # Apply strategy
    top10_edu["financing_strategy"] = top10_edu.apply(financing, axis=1)
    
    # Select relevant columns
    top10_edu = top10_edu[["app_name", "avg_installs", "financing_strategy", "category","rating"]]
    
    return top10_edu

def top_paid_edu_apps(df, n=10):
    # Filter only education + paid apps
    paid_edu = df[(df['category'].isin(['Education','Educational'])) & (df['app_status'] == 0)].copy()  # 0 = Paid
    
    # Define financing strategy
    def financing(row):
        if row['ads_flag'] == 1 and row['in_app_purchases_flag'] == 1:
            return "Paid + Ads + IAP"
        elif row['ads_flag'] == 1:
            return "Paid + Ads"
        elif row['in_app_purchases_flag'] == 1:
            return "Paid + IAP"
        else:
            return "Paid only"
    
    paid_edu['financing_strategy'] = paid_edu.apply(financing, axis=1)
    
    # Sort by installs and select top N
    top_paid_edu = paid_edu.sort_values(by="avg_installs", ascending=False).head(n)
    
    return top_paid_edu[['app_name', 'avg_installs', 'Price', 'financing_strategy']]
In [90]:
top10_educational_apps(df)
Out[90]:
         app_name                                           avg_installs  financing_strategy  category     rating
1050190  Duolingo: Learn Languages Free                     180715565.0   Ads + IAP           Education    4.6
122353   Google Classroom                                   156008640.0   Free (No Revenue)   Education    2.6
1682537  Samsung Global Goals                               146340788.0   Ads Only            Education    4.5
336867   Toca Kitchen 2                                     126884584.0   Ads Only            Educational  4.2
695655   Photomath                                          123678395.0   IAP Only            Education    4.7
178286   U-Dictionary: Oxford Dictionary Free Now Trans...  112914354.0   Ads + IAP           Education    4.5
2144949  Masha and the Bear. Educational Games              111844136.0   Ads + IAP           Educational  4.1
388316   Truck games for kids - build a house, car wash     107404399.0   Ads + IAP           Educational  4.1
1204946  Brainly – Home Learning & Homework Help            105322199.0   Free (No Revenue)   Education    4.3
725710   Baby Panda's Supermarket                           104487545.0   Ads + IAP           Educational  4.3

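Half of the top 10 Education apps by installs (5 of 10) combine ads with IAP. The rest are split between ads only (2), IAP only (1), and no visible monetization (2). The hybrid model is therefore the most common strategy among top performers.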
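
The breakdown of strategies among the top 10 can be reproduced with a simple tally. The list below is copied from the `financing_strategy` column of the table above, in row order.

```python
from collections import Counter

# financing_strategy values from the top-10 table, in row order
top10_strategies = [
    "Ads + IAP", "Free (No Revenue)", "Ads Only", "Ads Only", "IAP Only",
    "Ads + IAP", "Ads + IAP", "Ads + IAP", "Free (No Revenue)", "Ads + IAP",
]

tally = Counter(top10_strategies)
for strategy, n in tally.most_common():
    print(f"{strategy}: {n} of {len(top10_strategies)}")
```

On the full dataset, the same counts would come from calling `value_counts()` on the `financing_strategy` column returned by `top10_educational_apps(df)`.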

In [91]:
top_paid_edu_apps(df)
Out[91]:
         app_name                                           avg_installs  Price  financing_strategy
366246   Peppa Pig: Theme Park                              1661773.0     2.99   Paid only
1304647  Peppa Pig: Sports Day                              1314754.0     2.99   Paid only
1610632  Driving Theory Test 4 in 1 Kit + Hazard Percep...  659096.0      5.49   Paid + IAP
1422516  Teach Your Monster to Read: Phonics & Reading ...  650534.0      4.99   Paid only
903078   Toca Lab: Elements                                 646984.0      3.99   Paid + Ads
1284447  Calc Fast                                          554006.0      0.99   Paid only
263851   My Town : School                                   544722.0      2.99   Paid + IAP
260116   Toca Life: City                                    533906.0      3.99   Paid + Ads
392819   Speed Math 2018 - Pro                              533785.0      0.99   Paid + Ads
831818   My City : After School                             517785.0      2.99   Paid only

Paid apps can still succeed, but the market ceiling is much lower than free apps. Paid is more niche (parents buying games for kids, test prep apps, etc.).
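
To put the "lower ceiling" in numbers, the short calculation below compares the most-installed free Education app with the most-installed paid one, using the `avg_installs` values from the two tables above.

```python
# avg_installs values taken from the two tables above
top_free_installs = 180_715_565   # Duolingo: Learn Languages Free
top_paid_installs = 1_661_773     # Peppa Pig: Theme Park

ratio = top_free_installs / top_paid_installs
print(f"The top free app has roughly {ratio:.0f}x the installs of the top paid app")
```

The gap is about two orders of magnitude, which is consistent with treating paid Education apps as a niche segment.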

In [92]:
def free_vs_paid_performance(df):
    """Compare installs, ratings, and financing strategies between free and paid education apps."""
    edu_apps = df[df['category'] == 'Education'].copy()
    
    # Exclude invalid ratings
    edu_apps = edu_apps[edu_apps['rating'] > 0]
    
    # Split datasets
    free_apps = edu_apps[edu_apps['app_status'] == 1]   # Free
    paid_apps = edu_apps[edu_apps['app_status'] == 0]   # Paid
    
    # --- Summary stats ---
    summary = pd.DataFrame({
        "Avg Installs": [free_apps['avg_installs'].mean(), paid_apps['avg_installs'].mean()],
        "Median Installs": [free_apps['avg_installs'].median(), paid_apps['avg_installs'].median()],
        "Avg Rating": [free_apps['rating'].mean(), paid_apps['rating'].mean()],
        "Median Rating": [free_apps['rating'].median(), paid_apps['rating'].median()],
        "App Count": [len(free_apps), len(paid_apps)]
    }, index=["Free", "Paid"])
    
    
    # --- Financing strategies ---
    def financing_breakdown(subset):
        no_financing = ((subset['ads_flag'] == 0) & (subset['in_app_purchases_flag'] == 0)).sum()
        ads_only = ((subset['ads_flag'] == 1) & (subset['in_app_purchases_flag'] == 0)).sum()
        iap_only = ((subset['ads_flag'] == 0) & (subset['in_app_purchases_flag'] == 1)).sum()
        both = ((subset['ads_flag'] == 1) & (subset['in_app_purchases_flag'] == 1)).sum()
        return pd.Series({
            "No Financing": no_financing,
            "Ads only": ads_only,
            "In-App Purchases only": iap_only,
            "Ads & IAP": both
        })
    
    financing = pd.DataFrame({
        "Free": financing_breakdown(free_apps),
        "Paid": financing_breakdown(paid_apps)
    }).T
    
    # --- Visualization: installs ---
    fig1 = px.box(
        edu_apps,
        x="app_status",
        y="avg_installs",
        title="Distribution of Installs: Free vs Paid Education Apps",
        labels={"app_status":"App Type (1=Free, 0=Paid)", "avg_installs":"Average Installs"},
        log_y=True
    )
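    fig1.show()

    # Return both tables so the notebook displays them instead of discarding them
    return summary, financing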
In [93]:
free_vs_paid_performance(df)

Ratings are about the same for both groups (roughly 4.2).

Free apps reach far more users. Paid apps are not rated higher, so charging upfront brings no visible quality edge; it mainly limits adoption.
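To put a number on how similar the ratings are, a standardized effect size can complement the raw averages. The following is a minimal sketch, assuming the notebook's `category`, `app_status`, and `rating` columns. The function name `rating_gap` is hypothetical, and Cohen's d is swapped in here as one common choice; it is not something the original analysis computed.

```python
import numpy as np
import pandas as pd

def rating_gap(df):
    """Mean rating difference (Free minus Paid) and Cohen's d for Education apps."""
    edu = df[(df["category"] == "Education") & (df["rating"] > 0)]
    free = edu.loc[edu["app_status"] == 1, "rating"]
    paid = edu.loc[edu["app_status"] == 0, "rating"]
    # Pooled variance, weighting each group by its degrees of freedom
    pooled_var = (
        (len(free) - 1) * free.var(ddof=1) + (len(paid) - 1) * paid.var(ddof=1)
    ) / (len(free) + len(paid) - 2)
    diff = free.mean() - paid.mean()
    return {"mean_diff": diff, "cohens_d": diff / np.sqrt(pooled_var)}

# Usage in the notebook: rating_gap(df)
```

By the usual rule of thumb, an absolute `cohens_d` below roughly 0.2 is considered a small difference, which would be consistent with the conclusion that paid apps hold no rating advantage.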

Financing strategy effectiveness

In [94]:
def financing_strategy_effectiveness(df):
    """Summarize installs and ratings of education apps by monetization strategy."""
    edu_apps = df[df['category'] == 'Education'].copy()
    edu_apps = edu_apps[edu_apps['rating'] > 0]  # drop invalid ratings
    
    # Define monetization type
    def get_strategy(row):
        if row['app_status'] == 0:  # Paid
            return "Paid"
        elif row['ads_flag'] == 1 and row['in_app_purchases_flag'] == 1:
            return "Free + Ads + IAP"
        elif row['ads_flag'] == 1:
            return "Free + Ads"
        elif row['in_app_purchases_flag'] == 1:
            return "Free + IAP"
        else:
            return "Free Only"
    
    edu_apps['monetization'] = edu_apps.apply(get_strategy, axis=1)
    
    # Summary stats
    summary = edu_apps.groupby('monetization').agg(
        Avg_Installs=('avg_installs', 'mean'),
        Median_Installs=('avg_installs', 'median'),
        Avg_Rating=('rating', 'mean'),
        Median_Rating=('rating', 'median'),
        App_Count=('app_name', 'count')
    ).sort_values(by='Avg_Installs', ascending=False)
    
    # Visualization: Installs
    fig1 = px.bar(
        summary.reset_index(),
        x="monetization", y="Avg_Installs",
        color="monetization",
        title="Average Installs by Financing Strategy (Education Apps)",
        log_y=True,
        labels={"Avg_Installs": "Average Installs (log scale)"}
    )
    fig1.show()
    
    # Visualization: Ratings
    fig2 = px.bar(
        summary.reset_index(),
        x="monetization", y="Avg_Rating",
        color="monetization",
        title="Average Ratings by Financing Strategy (Education Apps)",
        labels={"Avg_Rating": "Average Rating (0–5)"}
    )
    fig2.show()

    return summary
In [95]:
# Run
financing_summary = financing_strategy_effectiveness(df)

By average installs, the most effective strategies are Free + IAP and Free + Ads + IAP. Ads-only apps underperform, while purely free and paid-only models leave monetization potential untapped.
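The strategy labels behind this comparison are assigned row by row with `DataFrame.apply`, which can be slow on a large dataset. As an optional alternative, the sketch below produces the same labels with `np.select`. The function name `label_monetization` is hypothetical; the column names and label strings are taken from the cell above.

```python
import numpy as np
import pandas as pd

def label_monetization(edu_apps):
    """Vectorized equivalent of the row-wise strategy labelling (illustrative)."""
    paid = edu_apps["app_status"] == 0
    ads = edu_apps["ads_flag"] == 1
    iap = edu_apps["in_app_purchases_flag"] == 1
    # np.select takes the first condition that matches, mirroring the if/elif order
    conditions = [paid, ads & iap, ads, iap]
    choices = ["Paid", "Free + Ads + IAP", "Free + Ads", "Free + IAP"]
    labels = np.select(conditions, choices, default="Free Only")
    return pd.Series(labels, index=edu_apps.index, name="monetization")

# Usage: edu_apps["monetization"] = label_monetization(edu_apps)
```

Because the conditions are evaluated in the same priority order as the original branches, paid apps are labelled "Paid" regardless of their ads or IAP flags, just as in the row-wise version.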


Recommendation for Xpertbot¶

  • Go Free at Launch: To gain traction, Xpertbot should launch its app for free.
  • Adopt a Freemium Model (Free + IAP, optionally Ads): Offer core features for free, but lock advanced features, certifications, or premium content behind in-app purchases. Ads can be included in the free version but must be limited to avoid hurting ratings.
  • Avoid a Paid-only Strategy: It drastically reduces adoption with no rating advantage.
  • Position Against Competitors: Apps like Duolingo and Photomath suggest that Free + Ads + IAP is scalable, sustainable, and well accepted by users.


Best Strategy for Xpertbot:

Adopt a Free + IAP (with optional ads) monetization model. Focus on strong user experience to secure high ratings, while gradually monetizing through advanced features or premium tiers.


Limitations & Next Steps¶

This analysis offers strong insights into education apps on the Play Store but has some limitations. The dataset is a snapshot in time and may not reflect the newest apps or removals. Some fields, such as installs, were reported in ranges and averaged, which can distort results, especially for very large apps. Missing values required imputation, and revenue was inferred from financing strategies rather than actual earnings, so results should be seen as indicative rather than exact.

For next steps, the analysis could be deepened by segmenting education apps into subcategories (e.g., language learning, test prep, kids’ games) and tracking trends over time. Benchmarking top competitors would reveal best practices, while analyzing user reviews could highlight needs and pain points. Building an interactive dashboard would give Xpertbot decision-makers a dynamic view of the market, and once the app is live, A/B testing different monetization models would confirm which strategies work best in practice.
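As a rough illustration of how an A/B test between two monetization variants might be evaluated, the sketch below applies a standard two-proportion z-test to conversion counts. The function name, the variant labels, and the numbers in the usage comment are hypothetical; no such experiment data exists in this dataset.

```python
from math import erfc, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical usage: 50 of 1,000 users convert under variant A, 80 of 1,000 under B
# z, p = two_proportion_z_test(50, 1000, 80, 1000)
```

A small p-value would indicate that the difference in conversion is unlikely to be noise, though in practice ratings and retention should be compared alongside conversion before choosing a model.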


In summary, the education app market is both promising and competitive. Free apps dominate adoption, while hybrid models (Free + IAP + Ads) drive the strongest performance. For Xpertbot, success will depend on offering a high-quality free app with thoughtful monetization through in-app purchases and, where appropriate, ads. Looking forward, continuous monitoring and A/B testing will help refine this strategy, ensuring sustainable growth and user satisfaction in the evolving education market.